Take-home Exercise 4

In this Take-home Exercise, I will explore to reveal the daily routine of two selected participants of the city of Engagement, Ohio USA.

Che Xuan https://www.linkedin.com/in/jacob-che-xuan-b646a9123/
2022-05-22

Task

Challenge 2: Patterns of Life considers the patterns of daily life throughout the city. You will describe the daily routines for some representative people, characterize the travel patterns to identify potential bottlenecks or hazards, and examine how these patterns change over time and seasons.

In Challenge 2, you will use visual analytic techniques to address these questions:

Overview

In this take-home exercise, ViSiElse and other appropriate visual analytics methods are used to reveal the life patterns of the city of Engagement, Ohio USA while addressing the questions stated in the Task section.

The data are processed by using appropriate tidyverse family of packages.

Sketch of Proposed Design

The picture below shows a sketch of the initial design proposed.

Installing & Launching R Packages

Before we get started, it is important for us to ensure that the required R packages have been installed. If yes, we will load the R packages. If they have yet to be installed, we will install the R packages and load them onto R environment.

The chunk code below will do the trick.

packages = c('tidyverse', 'data.table', 'knitr', 'ViSiElse', 'kableExtra', 'lubridate', 'ggplot2', 'ggiraph', 'plotly', 'DT', 'patchwork','crosstalk')

for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

Importing Data

The code chunk below imports ActivityLogs files from the rawdata folder into R by using list.files(), read_csv() of readr package, and map_df() of purrr package, and and save it as a tibble data frame called logs.

memory.limit(size=160000)
logs <- list.files(path = "./rawdata/ActivityLogs/",
                   pattern = "*.csv", 
                   full.names = T) %>% 
  map_df(~read_csv(., 
                   col_types = cols(.default = "c"))) 

Data Wrangling

During our data exploration, we realized some of the participants starts off with unstable financial status while some having stable financial status. As such, we will choose 1 participant each, with partipantId 4 from the unstable financial status group and partipantId 7 from the stable financial status group, and to look after their differing routines.

The code chunk below limits our logs data to the only selected participants using filter() function.

logs_two_p <- logs %>%
  filter(participantId == '4' | participantId == '7')

Data frame logs_two_p is saved in RDS format to avoid uploading large files to Git.

saveRDS(logs_two_p, 'data/logs_two_p.rds')
logs_two_p <- readRDS('data/logs_two_p.rds')

We will use kable() to review the structure of the data frame and kable_classic of kableExtra has been applied to ensure the full width of the table can be preserved.

kable(head(logs_two_p), format = "html", table.attr = "style='width:40%;'" ) %>% 
  kable_classic(full_width = T, position = "center", )
timestamp currentLocation participantId currentMode hungerStatus sleepStatus apartmentId availableBalance jobId financialStatus dailyFoodBudget weeklyExtraBudget
2022-03-01T00:00:00Z POINT (976.2409614204214 4574.575079082071) 4 AtHome JustAte Sleeping 194 -681.6505879983722 177 Unstable 20 0
2022-03-01T00:00:00Z POINT (-1023.8165705255449 1578.3713681439597) 7 AtHome JustAte Sleeping 97 683.8632217708105 270 Stable 20 1164.0690586556877
2022-03-01T00:05:00Z POINT (976.2409614204214 4574.575079082071) 4 AtHome JustAte Sleeping 194 -681.6505879983722 177 Unstable 20 0
2022-03-01T00:05:00Z POINT (-1023.8165705255449 1578.3713681439597) 7 AtHome JustAte Sleeping 97 683.8632217708105 270 Stable 20 1164.0690586556877
2022-03-01T00:10:00Z POINT (976.2409614204214 4574.575079082071) 4 AtHome JustAte Sleeping 194 -681.6505879983722 177 Unstable 20 0
2022-03-01T00:10:00Z POINT (-1023.8165705255449 1578.3713681439597) 7 AtHome JustAte Sleeping 97 683.8632217708105 270 Stable 20 1164.0690586556877

We will add date and year_month columns for further analysis.

logs_two_p$date <- date(logs_two_p$`timestamp`)
logs_two_p$year_month <- format(as.Date(logs_two_p$`timestamp`), "%Y-%m")

Let’s look at the different activities from currentMode:

[1] "AtHome"       "Transport"    "AtWork"       "AtRestaurant"
[5] "AtRecreation"

We will now calculate the duration of each activity in seconds and transform data using pivot_wider() of tidyr to suit the usage for visielse() visuals.

logs_daily_ss <- logs_two_p %>%
  group_by(date, participantId, currentMode) %>%
  summarise(count = n()) %>%
  mutate(ss = count * 5 *60) %>%
  ungroup() %>%
  select(date, participantId, currentMode, ss) %>%
  pivot_wider(names_from=currentMode, values_from=ss,
              values_fill=0)
date participantId AtHome AtRestaurant AtWork Transport AtRecreation
2022-03-01 4 54600 1200 28800 1800 0
2022-03-01 7 28200 2400 29100 21300 5400
2022-03-02 4 53700 1200 29100 2400 0
2022-03-02 7 27600 1200 29100 17100 11400
2022-03-03 4 33000 1200 29700 12300 10200
2022-03-03 7 27600 2400 29100 22500 4800

ViSiElse Plot

Before plotting, we will set an order to the activities.

logs_daily_ss <- logs_daily_ss[,c(1,2,5,4,3,6)]
visielse(logs_daily_ss, group = logs_daily_ss$participantId, informer = NULL, method = "cut", pixel = 1000)

-parameters 
 method           :  cut 
 grwithin         :   
 quantity         :  N 
 informer         :   
 tests            :  FALSE 
 threshold.test   :  0.01 
 pixel            :  1000 
 t_0              :  0 
-MATp             : 10 x 89 sparse Matrix of class "dgCMatrix" 
-L                : 0 x 0  data.frame 
-idsort           : 0 x 0 matrix 
-MATpsup          : 0 x 0 sparse Matrix of class "dgCMatrix" 
-idsup            : length  0 vector 
-Lsup             : 0 x 0  data.frame 
-colvect          : 2 x 1 matrix 
-BZL              : 0 x 0 sparse Matrix of class "dgCMatrix" 
-book             : 5 x 6  ViSibook 
-group            : length  900 factor 
-vect_tps         : length  89 vector 
-testsP           : length  0 vector 
-informers        : 0 x 0 matrix 

We realize that in general Participant 4 is spending less time in restaurant and at recreation than Participant 7, possibly due to his unstable financials status. This is more obvious in the difference in time spent for recreation where Participant 4 might want to restrict his non-necessity related spending in order to improve his financial outlook. This also aligns with the time spent at home where Participant * spent much more time at home than Participant 7. We also spot that both participants are having the same amount of time spent at work. As for transport perspective, Participant 4 is spending less time than Participant 7, potentially due to a closer location from home.

Static Trellis Plot

We have transformed the data accordingly and displayed a trellis plot partitioned by currentMode

qplot(year_month, hour, data = logs_two_p_count, color = participantId) +
  facet_wrap(~ currentMode) +
  labs(y= 'Hour', x= 'Time',
     title = "Trend for Respective Activities Over Time") +
  guides(color = guide_legend(title = "Participant Id")) +
  theme(axis.title.y= element_text(angle=0), axis.ticks.x= element_blank(), axis.text.x=element_blank(),
      axis.line= element_line(color= 'grey'))

We can see that despite a few of the irregular patterns, generally the time spent on each activity follow similar trends for both participant 4 and participant 7.

Interactive Trellis Plot

In order to add interactivity to the previous trellis plot, we will use ggplotly() of plotly and bscols() of crosstalk.

dd <- highlight_key(logs_two_p_count)

graph1 <- qplot(year_month, hour, data = logs_two_p_count, color = participantId) +
  facet_wrap(~ currentMode) +
  labs(y= 'Hour', x= 'Time',
     title = "Trend for Respective Activities Over Time") +
  guides(color = guide_legend(title = "Participant Id")) +
  theme(axis.title.y= element_text(angle=0), axis.ticks.x= element_blank(), axis.text.x=element_blank(),
      axis.line= element_line(color= 'grey'))

gg <- highlight(ggplotly(graph1),
                "plotly_selected")

crosstalk::bscols(gg, 
                  widths = c(12, 12),
                  DT::datatable(dd,
                                rownames = FALSE,
                                colnames = c('Year Month' = 'year_month',
                                             'Participant Id' = 'participantId',
                                             'Current Mode' = 'currentMode',
                                             'Hour' = 'hour'),
                                filter = 'top',
                                class = 'display'))